A virtual learning environment (VLE) is an online learning platform that
allows many students, even millions, to study according to their interests
without being limited by space and time. Online learning environments have
many benefits, but they also have drawbacks, such as high dropout
rates, low engagement, and challenges with students' self-regulated behavior.
Evaluating and analyzing the student data generated by online learning
platforms can help instructors understand and monitor students' learning
progress. In this study, we propose a predictive model for assessing student
success in online learning. We investigate the effect of hyperparameters on
the prediction of student learning outcomes in VLEs by the long short-term
memory (LSTM) model. A hyperparameter is a setting that controls the learning
process and therefore affects prediction results. Two optimization algorithms,
adaptive moment estimation (Adam) and Nesterov-accelerated adaptive moment
estimation (Nadam), were used to tune the LSTM model's hyperparameters. In our
experiments, the average accuracy of the LSTM model using Nadam
optimization is 89%, with a maximum accuracy of 93%. The LSTM model
with Nadam optimization performs better than the model with Adam
optimization when predicting student outcomes in online learning.
1. INTRODUCTION

The design and development of virtual learning environments (VLE), learning management systems
(LMS), and other online learning platforms have rapidly improved, removing the constraints of time
and place while lowering the cost of education and easing access to it. Evaluating and analyzing the
student data generated by online learning platforms can help instructors understand and monitor
students' learning progress [1]. The earlier students' performance is detected in a VLE, the sooner
the instructor can warn and encourage students to keep them on track. However, it is challenging to
create a predictive model that can precisely identify students' in-course learning behaviors from
behavior data.

In previous research, machine learning (ML) techniques have been used extensively to develop
predictive models of student learning behavior in VLEs [2]-[6]. However, ML-based predictive models
have limitations, for example in the features selected and the ML models used [4]-[8]. Advances in
deep learning methodologies allow prediction models to perform more accurately [9]-[13],
particularly in an online learning environment where a lot of data is produced every day. One of the
best deep learning algorithms for handling time series data is long short-term memory (LSTM) [14],
[15].

The LSTM architecture is an enhanced recurrent neural network (RNN) that captures long-term
dependencies in sequential time series data [16]. LSTMs have many hyperparameters, including the
learning rate, the number of hidden units, the input length, and the batch size [17], [18].
Hyperparameters are parameters that are explicitly set to control how the model learns [19]. They
significantly affect the model's output [20], and determining the right combination of model and
hyperparameters is often a challenge. We want to investigate how hyperparameters affect LSTM
performance, because hyperparameter selection and optimization frequently make the difference in
model accuracy. To fine-tune the hyperparameters, we used the adaptive moment estimation (Adam) and
Nesterov-accelerated adaptive moment estimation (Nadam) optimization algorithms. Adam and Nadam are
two of the most effective gradient descent optimization algorithms [21], [22].

Several prior studies have used the LSTM algorithm for prediction in online learning. The
attention-based multi-layer (AML) LSTM proposed in [23] combines clickstream data and student
demographic data to predict student performance. The results show that, from week 5 to week 25, the
proposed model increases accuracy on the four-class classification task by 0.52% to 0.85%.
According to Alsabhan [24], the LSTM model predicts withdrawal in a VLE more accurately than both
logistic regression and neural networks. For detecting student cheating in higher education, an
LSTM with dropout layers, dense layers, and the Adam optimizer [25] achieves 90% accuracy,
outperforming ML algorithms.

In [26], the LSTM model for predicting student performance was optimized using the Adam and root
mean square propagation (RMSprop) algorithms; the LSTM model with Adam performed better than the
one with RMSprop. According to Bock and Weiß [27], Adam and Nadam outperformed adaptive learning
rate delta (AdaDelta), adaptive gradient descent (AdaGrad), and RMSprop in optimizing parameters,
as measured by the perceptual loss function and visual perception. In this study, the LSTM model
was tested with the Adam and Nadam optimization algorithms to determine which yields the best
performance.

We propose an LSTM model, optimized with Adam and Nadam, for predicting student learning outcomes
in a VLE. Each model is tested with the Adam and Nadam optimization algorithms, and the accuracy,
recall, precision, and F1-score of each model are then assessed to compare the outcomes. Adam
optimization is a stochastic gradient descent technique based on the adaptive estimation of
first-order and second-order moments [28]. The method is highly effective for complex problems
involving a large number of variables or a large amount of data. Adam combines gradient descent
with momentum and the RMSprop algorithm, building on the strengths of both to produce a
better-optimized gradient descent.

The Nadam algorithm is a gradient descent optimization method that improves the quality and
convergence rate of neural networks [29]. Nadam combines Adam with Nesterov's accelerated gradient
(NAG): it alters the momentum component of Adam while keeping the adaptive learning rate. Nadam
converges faster and outperforms NAG and Adam on some types of data sets. Our research uses two
optimization algorithms, Adam and Nadam. The hyperparameters that we use to construct the LSTM
model include the learning rate, the number of hidden units, the input length, the batch size, and
dropout. This paper aims to address the following research questions: i) RQ1: how do the
optimization techniques affect the LSTM model, and how do they compare with each other? and ii)
RQ2: which LSTM model is the most effective after assessing how well each optimization method
worked?


2. METHOD 

Figure 1 shows the research methodology used to compare LSTM models trained with gradient descent
optimization methods for forecasting student performance in a VLE. The initial stages of the
research are data gathering, data understanding, and data processing [30]. Afterward, the data are
prepared for the LSTM models and separated into training, validation, and testing sets.


2.1. Datasets 

This study uses the Open University Learning Analytics Dataset (OULAD), an open VLE dataset. It
includes the demographic information, login patterns, and assessment behavior of 32,593 students
over the course of nine months. It consists of seven modules, or courses, each of which is taught
at least twice a year at different times. Student performance is divided into four groups: 9% of
students received a distinction, 38% passed, 22% failed, and 31% withdrew from their studies. The
raw dataset consists of files that contain student demographics, clickstream data showing how
students interact with the online environment, assessments, quiz results, and module information.

The dataset includes data about both students and courses, covering seven courses. Our study
focuses on data from the course with code BBB. A total of 7,909 students are enrolled in this
course, which focuses on social sciences and has the highest enrollment of any subject.

Figure 1. The phases of the research methodology used


2.2. Preparation of data 

Data preparation is the collection, combination, cleaning, and transformation of raw data so that
an ML model can make accurate predictions. The dataset is preprocessed to select the BBB course
features used to train and test the model. The selected features are the module code, presentation
code, student ID, clicks, assignment assessment, average assignment assessment, and final result.

After preprocessing, the BBB course data contain 1,565,580 rows. The BBB course has two
presentation (semester) codes: "B" begins in February, while "J" begins in October. Figure 2 shows
the presentation codes used in the BBB course. The BBB course data are divided into 60% for
training, 20% for validation, and 20% for testing.
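The 60/20/20 division can be sketched as follows. This is a minimal illustration only: the paper does not say how rows were assigned to each part, so the random shuffle, the fixed seed, and the helper name `split_indices` are assumptions, and a real pipeline for sequential data might instead split by student or by time.

```python
import random


def split_indices(n_rows, train=0.6, val=0.2, seed=0):
    """Shuffle row indices and cut them into train/validation/test parts.

    Assumption: a plain random split; the remaining rows after the
    train and validation cuts form the test part.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train)
    n_val = round(n_rows * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


# Hypothetical example with 1,000 rows.
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed keeps the three parts reproducible between runs, which matters when two optimizers are compared on the same test records.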
Figure 2. The BBB course's presentation code


2.3. The architecture of the designed long short-term memory model

LSTM is a variant of the RNN [14]. It addresses the RNN's inability to make predictions based on
information learned long ago in a sequence. The fundamental distinction between the LSTM and RNN
architectures is that the hidden layer of the LSTM is a gated unit, or gated cell [15]. The cell is
made up of four layers that interact to produce both the cell state and the output of the cell, and
these two items are then passed on to the next hidden layer. In contrast to RNNs, which have only
one tanh layer, LSTMs have three logistic sigmoid gates and one tanh layer.
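To make the four-layer structure concrete, the following sketch computes one time step of a gated cell with a single hidden unit and a scalar input. It follows the standard LSTM formulation rather than the authors' implementation; the function name `lstm_step`, the parameter layout, and the weight values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit with scalar input.

    params maps each of the four layers ("f", "i", "o", "g") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old cell state to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much of the candidate to admit
    o = sigmoid(pre_activation("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre_activation("g"))  # tanh layer: candidate cell values

    c = f * c_prev + i * g              # new cell state
    h = o * math.tanh(c)                # new hidden state (the cell's output)
    return h, c


# Hypothetical weights, chosen only for illustration.
params = {name: (0.5, 0.1, 0.0) for name in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The pair `(h, c)` returned at each step is what the text calls the output and the cell state; both are carried forward to the next step.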

The LSTM model designed for prediction in a VLE uses three input layers, two output layers with one
node each and sigmoid activation functions, and one hidden layer with sixteen nodes and a
hyperbolic tangent activation function to model non-linear relationships. To reduce overfitting, a
dropout layer with a rate of 50% is applied at each training step. The LSTM model was trained with
a batch size of 32 using back-propagation. Figure 3 shows the design of the LSTM architecture.

Figure 3. The architecture of the designed LSTM model


2.4. Optimization algorithms using gradient descent 

Gradient descent is used to train and improve neural network models [31]. The Adam and Nadam
algorithms were used in this study as gradient-based optimization algorithms. Gradient descent
requires both the objective function to be optimized and its derivative. The gradient descent
optimization algorithms used in the study are as follows:


2.4.1. Adam optimizer 

Adam is an optimization algorithm that can be used in place of the more traditional stochastic
gradient descent approach to iteratively update network weights based on training data [21], [28].
Adam can be characterized as a combination of stochastic gradient descent with momentum and
RMSprop. It is an adaptive learning rate technique that computes individual learning rates for
different parameters.
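A minimal sketch of the standard Adam update on a one-parameter toy problem is given below. The objective, the step count, and the function name `adam_minimize` are assumptions made for illustration; the default values of `beta1`, `beta2`, and `eps` are the commonly used ones, not settings reported in this study.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment (RMSprop-style) estimate
        m_hat = m / (1 - beta1 ** t)         # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)         # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_adam = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_adam)
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate.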


2.4.2. Nadam optimizer 
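The NAG and Adam algorithms were combined to create the Nadam algorithm [22], [29]. Nadam performs
a momentum update for the value of $\hat{m}_t$ [32]. The update rule has the following form:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1)\, g_t}{1 - \beta_1^t} \right) \quad (1)$$

where $\theta_t$ are the parameters at step $t$, $\eta$ is the learning rate, $g_t$ is the current
gradient, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first-moment and second-moment
estimates, $\beta_1$ is the decay rate of the first moment, and $\epsilon$ is a small constant that
prevents division by zero.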
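The Nadam update rule in (1) can be sketched on the same kind of one-parameter toy problem. As before, the objective, the step count, and the name `nadam_minimize` are illustrative assumptions rather than the configuration used in the experiments.

```python
import math


def nadam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, steps=500):
    """Minimize a scalar function with the Nadam update of equation (1)."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Nesterov-style term: corrected momentum blended with the current gradient.
        lookahead = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        theta -= lr / (math.sqrt(v_hat) + eps) * lookahead
    return theta


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_nadam = nadam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_nadam)
```

The only difference from a plain Adam step is the `lookahead` term, which replaces the bias-corrected momentum in the numerator.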
2.5. Performance evaluation of the model

The model's effectiveness was measured using a confusion matrix, classification accuracy (CA),
precision, recall, and F1-score (F1) [33]. The confusion matrix summarizes, for each actual class
in the dataset, the number of correct and incorrect model predictions [34]. Accuracy, a crucial and
intuitive metric, is the proportion of correct predictions among all predictions. Precision is the
proportion of correctly predicted positive outcomes among all outcomes predicted as positive.
Recall is the proportion of correctly predicted positive outcomes among all actual positive cases.
The F1-score is the harmonic mean of precision and recall.

$$\text{Accuracy} = \frac{\sum \text{True positives (TP)} + \sum \text{True negatives (TN)}}{\sum \text{Total population}} \quad (2)$$

$$\text{Recall} = \frac{\sum \text{True positives (TP)}}{\sum \text{True positives (TP)} + \sum \text{False negatives (FN)}} \quad (3)$$

$$\text{Precision} = \frac{\sum \text{True positives (TP)}}{\sum \text{False positives (FP)} + \sum \text{True positives (TP)}} \quad (4)$$

$$\text{F1 score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \quad (5)$$


Here, positives denote students who actually fail and negatives denote students who actually pass;
true denotes a correct prediction and false an incorrect one. TP, TN, FN, and FP denote the numbers
of true positives, true negatives, false negatives, and false positives, respectively. Table 1
shows the confusion matrix for the combinations of actual and predicted classes.


Table 1. The confusion matrix 


                         Actual
Predicted       Positive (1)    Negative (0)
Positive (1)    TP              FP
Negative (0)    FN              TN
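Equations (2) to (5) can be evaluated directly from the four confusion-matrix counts. The sketch below uses the decile 5 counts reported for the Adam-optimized model in Table 4, with "fail" as the positive class; the function name `classification_metrics` and the zero-division guards are assumptions added for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g. no positive predictions at all).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Decile 5, LSTM+Adam (Table 4), with "fail" as the positive class:
# 227 fails predicted as fail, 142 fails predicted as pass,
# 130 passes predicted as fail, 1,022 passes predicted as pass.
metrics = classification_metrics(tp=227, tn=1022, fp=130, fn=142)
print({name: round(value, 2) for name, value in metrics.items()})
```

The accuracy obtained this way is about 0.82, in line with the table. The recall, precision, and F1 values here describe the fail class only, so they need not match the figures in Tables 2 and 3 if those were averaged across classes.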


3. RESULTS AND DISCUSSION 

In this study, Python was used for model training and testing. The model configurations were
developed using 10 different combinations of architectural parameters, and validation was carried
out on 20% of the training dataset samples. The Adam and Nadam optimization algorithms were used to
refine the models' hyperparameters. The outcomes of the LSTM models' performance assessment are
described below.


3.1. Performance analysis of the long short-term memory model 

We assess the accuracy, recall, precision, and F1-score of the LSTM model optimized with the Adam
and Nadam algorithms, and compare the results to determine which performs best. In the first
experiment, the LSTM model was trained and tested with the Adam optimizer. In the second
experiment, the LSTM model was trained and tested with the Nadam optimizer. Table 2 shows the
results for the LSTM model with hyperparameter settings applied using the Adam algorithm.


Table 2. LSTM model results with Adam optimisation 


LSTM+Adam optimizer 
Decile Accuracy Recall Precision F1-score
0 0.75 0.76 0.57 0.65 
1 0.60 0.60 0.67 0.62 
2 0.63 0.64 0.67 0.65 
3 0.71 0.72 0.71 0.72 
4 0.78 0.78 0.79 0.78 
5 0.82 0.82 0.82 0.82 
6 0.87 0.88 0.88 0.87 
7 0.88 0.88 0.88 0.88 
8 0.89 0.90 0.90 0.89 
9 0.91 0.92 0.92 0.92 
10 0.92 0.92 0.92 0.92 
The average accuracy of the LSTM model with Adam optimization is 87%, with a lowest accuracy of 60%
and a highest accuracy of 92%. The highest recall is 92%, the lowest is 60%, and the average is
88%. Table 3 shows the results for the LSTM model with hyperparameter settings applied using the
Nadam algorithm.

According to the experimental results, the average accuracy of the LSTM model with Nadam
optimization is 89%, with a highest accuracy of 93% and a lowest accuracy of 60%. The average
recall is 89%, with a lowest recall of 60% and a highest recall of 93%. We visualize and compare
the accuracy results of the LSTM-Adam and LSTM-Nadam models in Figure 4. The results show that the
LSTM-Nadam model outperforms the LSTM-Adam model at several deciles.


Table 3. LSTM model results with Nadam optimisation 


LSTM+Nadam optimizer 
Decile Accuracy Recall Precision F1-score
0 0.75 0.76 0.57 0.65
1 0.60 0.60 0.66 0.62
2 0.72 0.72 0.71 0.72
3 0.71 0.71 0.72 0.71
4 0.78 0.78 0.78 0.78
5 0.82 0.82 0.82 0.82
6 0.88 0.88 0.87 0.87
7 0.88 0.88 0.88 0.88
8 0.90 0.90 0.90 0.89
9 0.92 0.92 0.92 0.91
10 0.93 0.93 0.92 0.92

Figure 4. Comparison of the LSTM models’ level of accuracy
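The decile-by-decile comparison behind Figure 4 can be reproduced from the accuracy columns of Tables 2 and 3. In the sketch below the two lists are copied by hand from those tables, and the helper name `compare_by_decile` is hypothetical.

```python
# Accuracy per decile (0 to 10), copied from Table 2 (Adam) and Table 3 (Nadam).
adam_acc = [0.75, 0.60, 0.63, 0.71, 0.78, 0.82, 0.87, 0.88, 0.89, 0.91, 0.92]
nadam_acc = [0.75, 0.60, 0.72, 0.71, 0.78, 0.82, 0.88, 0.88, 0.90, 0.92, 0.93]


def compare_by_decile(baseline, candidate):
    """Return the deciles where the candidate is better and where it is worse."""
    pairs = list(enumerate(zip(baseline, candidate)))
    better = [d for d, (b, c) in pairs if c > b]
    worse = [d for d, (b, c) in pairs if c < b]
    return better, worse


nadam_better, nadam_worse = compare_by_decile(adam_acc, nadam_acc)
print("Nadam higher at deciles:", nadam_better)  # [2, 6, 8, 9, 10]
print("Nadam lower at deciles:", nadam_worse)    # []
print("Best accuracy:", max(adam_acc), "vs", max(nadam_acc))
```

On the reported values, Nadam is higher at five deciles and equal at the other six, which is consistent with the statement that it outperforms Adam at some deciles rather than everywhere.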


3.2. Results from the long short-term memory model for prediction 

This section evaluates the LSTM model's performance in predicting students' final results in a VLE.
A total of 1,521 records from the testing data are used to evaluate the LSTM model. Table 4 shows
the prediction results of the LSTM model using the Adam optimization algorithm.

For the LSTM model with the Adam optimization algorithm at decile 0, 1,149 students who passed were
correctly classified as pass. In addition, 369 students who actually failed were incorrectly
classified as pass. No student who actually failed was classified as fail. Three students who
actually passed were incorrectly classified as fail.

The same testing data are used for prediction with the LSTM model using the Nadam optimization
algorithm. Table 5 shows the prediction results. The LSTM model with the Nadam optimization
algorithm achieves higher accuracy at some deciles.

Table 4. LSTM model prediction outcomes using Adam optimization


Model: LSTM+Adam optimizer
Decile   Actual   Predicted Fail   Predicted Pass   Accuracy
0        Fail     0                369              75%
         Pass     3                1,149
1        Fail     164              205              60%
         Pass     402              750
2        Fail     147              222              63%
         Pass     333              819
3        Fail     148              221              71%
         Pass     209              943
4        Fail     208              161              78%
         Pass     167              985
5        Fail     227              142              82%
         Pass     130              1,022
6        Fail     201              168              87%
         Pass     18               1,134
7        Fail     266              103              88%
         Pass     77               1,075
8        Fail     260              109              89%
         Pass     44               1,110
9        Fail     283              86               91%
         Pass     37               1,115
10       Fail     283              109              92%
         Pass     39               1,113


Table 5. LSTM model prediction outcomes using Nadam optimization 


Model: LSTM+Nadam optimizer
Decile   Actual   Predicted Fail   Predicted Pass   Accuracy
0        Fail     0                369              75%
         Pass     0                1,152
1        Fail     166              203              60%
         Pass     411              741
2        Fail     135              234              72%
         Pass     188              964
3        Fail     159              210              71%
         Pass     227              925
4        Fail     209              160              78%
         Pass     175              977
5        Fail     220              149              82%
         Pass     123              1,029
6        Fail     223              146              88%
         Pass     42               1,110
7        Fail     257              112              88%
         Pass     72               1,080
8        Fail     260              109              90%
         Pass     20               1,132
9        Fail     272              97               92%
         Pass     30               1,112
10       Fail     260              109              93%
         Pass     11               1,141


An LSTM-based prediction model for gradient-descending optimization in virtual ... (Edi Ismanto) 


ISSN: 2722-3221


At decile 0, the LSTM model with the Nadam optimization algorithm correctly classifies all 1,152
passing students. However, 369 students who actually failed are incorrectly classified as pass, so
no student who actually failed is classified as fail. No passing student is misclassified as fail.


4. CONCLUSION 

This study categorized student performance in a VLE using an LSTM model optimized with the Adam and
Nadam optimization algorithms. The average accuracy of the LSTM model with Nadam optimization is
89%, with a maximum accuracy of 93%, while the LSTM model with Adam optimization has an average
accuracy of 87% and a maximum accuracy of 92%. The LSTM model with the Nadam optimization algorithm
therefore performs better than the one with the Adam optimization algorithm on the VLE prediction
problem. The contribution of this study is the performance improvement of the LSTM model through
hyperparameter optimization using the Adam and Nadam algorithms, which can serve as a reference
when developing LSTM-based prediction systems. Future research could apply meta-heuristic
optimization algorithms and assess the performance of the resulting models.